
A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Neural Information Processing Systems

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent-ascent algorithm that guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that it generally cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT).
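The abstract names FedGDA-GT but does not spell out its update rule. The toy sketch below illustrates the general idea of a gradient-tracking correction on a federated quadratic minimax problem; the quadratic objectives, stepsize, and the exact form of the correction are all illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

# Toy federated minimax: agent i holds a private strongly-convex-strongly-concave
# quadratic f_i(x, y) = a_i x^2/2 + b_i x y - c_i y^2/2 + d_i x - e_i y,
# and the global objective is the average of the f_i.
rng = np.random.default_rng(0)
n_agents = 4
a = rng.uniform(1.0, 2.0, n_agents)
c = rng.uniform(1.0, 2.0, n_agents)
b = rng.uniform(-0.3, 0.3, n_agents)
d = rng.uniform(-1.0, 1.0, n_agents)
e = rng.uniform(-1.0, 1.0, n_agents)

def local_grads(i, x, y):
    """(df_i/dx, df_i/dy) for agent i."""
    return a[i] * x + b[i] * y + d[i], b[i] * x - c[i] * y - e[i]

# Closed-form saddle point of the averaged objective, for checking:
# mean(a) x + mean(b) y + mean(d) = 0 and mean(b) x - mean(c) y - mean(e) = 0
x_star, y_star = np.linalg.solve(
    [[a.mean(), b.mean()], [b.mean(), -c.mean()]], [-d.mean(), e.mean()]
)

eta, local_steps, rounds = 0.02, 10, 400
x, y = 0.0, 0.0
for _ in range(rounds):
    # Server aggregates the exact global gradient at the round's start
    g0 = [local_grads(i, x, y) for i in range(n_agents)]
    gx0 = np.mean([gi[0] for gi in g0])
    gy0 = np.mean([gi[1] for gi in g0])
    xs, ys = [], []
    for i in range(n_agents):
        xi, yi = x, y
        gxi0, gyi0 = local_grads(i, x, y)  # local gradient at round start
        for _ in range(local_steps):
            gxi, gyi = local_grads(i, xi, yi)
            xi -= eta * (gxi - gxi0 + gx0)  # descent in x with tracked gradient
            yi += eta * (gyi - gyi0 + gy0)  # ascent in y with tracked gradient
        xs.append(xi)
        ys.append(yi)
    x, y = float(np.mean(xs)), float(np.mean(ys))  # server averages iterates
```

The correction term makes the true saddle point an exact fixed point of the round map even with multiple local steps and a constant stepsize, which is what plain Local SGDA lacks in the heterogeneous deterministic setting.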


A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

Lin, Yixuan, Zhang, Kaiqing, Yang, Zhuoran, Wang, Zhaoran, Başar, Tamer, Sandhu, Romeil, Liu, Ji

arXiv.org Machine Learning

Recently, there has been increasing interest in developing distributed machine learning algorithms. Notable examples include distributed linear regression [1], multi-armed bandits [2], reinforcement learning (RL) [3], and deep learning [4]. Such algorithms have promising applications in large-scale networks, such as social platforms, online economic networks, cyber-physical systems, and the Internet of Things, primarily because in such complex networks it is impractical to collect all information at a single point, and each component of the network may be unwilling to share its private information due to privacy concerns. Multi-agent reinforcement learning (MARL) problems have recently received increasing attention. In general, MARL problems are investigated in settings that are collaborative, competitive, or a mixture of the two. For collaborative MARL, the most rudimentary framework is the canonical multi-agent Markov decision process [5, 6], where the agents share a common reward function that is determined by the joint actions of all agents. Another notable framework for collaborative MARL is the team Markov game model, also with a shared reward function among agents [7, 8]. These two frameworks were then extended to the setting where agents are allowed to have heterogeneous reward functions [3, 9-12], collaborating with the goal of maximizing the long-term return corresponding to the team-averaged reward.


Communication-efficient Algorithm for Distributed Sparse Learning via Two-way Truncation

Ren, Jineng, Haupt, Jarvis

arXiv.org Machine Learning

We propose a communication- and computation-efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data and the master machine solves one shifted $l_1$-regularized minimization problem. Via a Two-way Truncation procedure, the communication cost is reduced from a constant multiple of the dimension, as in the state-of-the-art algorithm, to a constant multiple of the sparsity level. Theoretically, we prove that the estimation error of the proposed algorithm decreases exponentially and matches that of the centralized method under mild assumptions. Extensive experiments on both simulated and real data verify that the proposed algorithm is efficient and performs comparably with the centralized method on high-dimensional sparse learning problems.
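The two-way truncation idea (workers send sparsified gradients, the master broadcasts a sparsified iterate) can be sketched on a distributed sparse regression problem. Note the master's step below is a simple proximal-gradient update on the $l_1$-regularized loss, a stand-in for the paper's shifted $l_1$ subproblem; all problem sizes and parameters are illustrative:

```python
import numpy as np

def truncate(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Sparse linear model with data split across m workers
rng = np.random.default_rng(1)
dim, sparsity, m, n_per = 200, 5, 4, 400
w_true = np.zeros(dim)
w_true[:sparsity] = rng.uniform(1.0, 2.0, sparsity)
shards = []
for _ in range(m):
    X = rng.standard_normal((n_per, dim))
    shards.append((X, X @ w_true + 0.01 * rng.standard_normal(n_per)))

w = np.zeros(dim)
eta, lam, k = 0.2, 0.1, 2 * sparsity
for _ in range(100):
    # Workers send truncated local gradients: k numbers each, not dim
    g = np.mean(
        [truncate(X.T @ (X @ w - y) / n_per, k) for X, y in shards], axis=0
    )
    # Master takes one proximal-gradient step on the l1-regularized loss
    # and broadcasts a truncated iterate back to the workers
    w = truncate(soft_threshold(w - eta * g, eta * lam), k)
```

Both messages per round carry only `k` nonzeros, so per-round communication scales with the sparsity level rather than the ambient dimension, which is the saving the abstract describes.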


Communication-Efficient Algorithms for Statistical Optimization

Zhang, Yuchen, Duchi, John C., Wainwright, Martin

arXiv.org Machine Learning

We analyze two communication-efficient algorithms for distributed statistical optimization on large-scale data sets. The first algorithm is a standard averaging method that distributes the $N$ data samples evenly to $m$ machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $O(N^{-1}+(N/m)^{-2})$. Whenever $m \le \sqrt{N}$, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all $N$ samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as $O(N^{-1} + (N/m)^{-3})$, and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as $O(N^{-1} + (N/m)^{-3/2})$, easing computation at the expense of penalties in the rate of convergence. We also provide experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with $N \approx 2.4 \times 10^8$ samples and $d \approx 740,000$ covariates.
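The average mixture algorithm described above is easy to sketch on ordinary least squares: split the samples evenly, solve each shard separately, and average the estimates in one round of communication. The problem sizes and noise level below are arbitrary illustrative choices:

```python
import numpy as np

# Average mixture on a least-squares problem: split the N samples evenly
# across m machines, solve on each shard, then average the m estimates.
rng = np.random.default_rng(2)
N, m, dim = 8000, 8, 5
theta_true = rng.standard_normal(dim)
X = rng.standard_normal((N, dim))
y = X @ theta_true + 0.5 * rng.standard_normal(N)

local = [
    np.linalg.lstsq(Xs, ys, rcond=None)[0]      # per-machine least squares
    for Xs, ys in zip(np.array_split(X, m), np.array_split(y, m))
]
theta_avg = np.mean(local, axis=0)              # one round of communication

theta_central = np.linalg.lstsq(X, y, rcond=None)[0]  # sees all N samples

err_avg = np.linalg.norm(theta_avg - theta_true)
err_central = np.linalg.norm(theta_central - theta_true)
```

Here $m = 8 \le \sqrt{N} \approx 89$, the regime in which the analysis above says the averaged estimate matches the centralized rate, and the two errors indeed come out at the same order of magnitude.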


Communication-Efficient Algorithms for Statistical Optimization

Zhang, Yuchen, Wainwright, Martin J., Duchi, John C.

Neural Information Processing Systems

We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the $N$ data samples evenly to $m$ machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $O(N^{-1}+(N/m)^{-2})$. Whenever $m \le \sqrt{N}$, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all $N$ samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as $O(N^{-1}+(N/m)^{-3})$, and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on large-scale problems from the Microsoft Learning to Rank dataset.